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Abstract 

We present an extension of Eigen's model for quasi-species including the competition among 
individuals, proposed as the simplest mechanism for the formation of new species in a 
smooth fitness landscape. The evolution equation for the probability distribution of species 
has the form of a nonlinear reaction-diffusion equation. We are able to obtain analytically 
an approximation of the critical threshold for the species formation. The comparison with 
the numerical resolution of the original equation is very good. 



1 Introduction 

In this paper we address the problem of formation of 
species in simple ecosystems, possibly mirroring some 
aspects of bacterial and viral evolution. Our model can 
be considered as an extension of Eigen's model |H, 0. 
With respect to the latter, we introduce the interactions 
among individuals. 

The correspondence of this kind of models with real 
biological systems is rather schematic: the (haploid) or- 
ganisms are only represented by their genetic informa- 
tion (the genotype), and we do not consider sexuality 
nor age structure or polymorphism. Moreover, a spatial 
mean field approximation is applied, so that the rele- 
vant dynamical quantity is the distribution of genotypes. 
This distribution evolves under the combined effects of 
selection and mutations. Selection is represented by the 
concept of fitness landscape |g, Q| , a function of the geno- 
type that represents the average fraction of survivors per 
unit of time, and includes the effects of reproductive ef- 
ficiency, survival and foraging strategies, predation and 
parassitism, etc. In other words, the fitness function is 
the evolutive landscape seen by a given individual. Only 
point mutations are considered, and these are assumed 
to be generated by independent Poisson processes. The 
presence of mutations allows the definition of the dis- 
tance between two genotypes, given by the minimum 
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number of mutations required to connect them. Assum- 
ing only point mutations, the genotypic space is a hy- 
percube, each direction being spanned by the possible 
values of each symbol in the sequence. In the Boolean 
case, which is the one considered here, each point muta- 
tion connects two vertices along the axis corresponding 
to the locus where the mutation has taken place. 

In the original work, Eigen and Schuster showed 
that a landscape with a single maximum of the fitness 
allows for a phase transition from a bell-shaped distri- 
bution of the population centered at the location of the 
maximum of the fitness function (the master sequence) 
to a flatter distribution. This error transition is triggered 
by the mutation rate. 

It has been shown g, |[ that Eigen's model is equiv- 
alent to an equilibrium statistical mechanical model of 
interacting spins, the latter being the elements of the 
genome. In this way several evolutionary concepts can 
be mapped to a statistical mechanics language. In partic- 
ular, for a static fitness landscape, the evolution becomes 
the process of optimization of an "energy" function (the 
logarithm of the fitness) , balanced by the entropy. The 
genealogy of a particular genome can be represented as 
a two-dimensional spin system. We refer to the two di- 
rections as the time and the genotypic one, respectively. 
The coupling in the time direction is ferromagnetic and 
is given by the mutations. The coupling in the genotypic 
direction is given by the fitness function and is in general 
long range. While this mapping is suggestive and allows 
a precise characterization of vaguely defined terms, from 



the point of view of numerical and analytical treatments 
of the equations, the original differential equation ap- 
proach is more effective. 

Borrowing the language of statistical mechanics, the 
single sharp maximum case (the one studied originally 
by Eigen and Schuster |g]) can be defined as a degener- 
ate genotypic space, since all individuals but the master 
sequence have the same fitness, and we consider it as 
a particular case of the more general class of genotypic 
spaces in which the fitness depends only on the genotypic 
distance from the master sequence. 

The degenerate landscape can be represented as a lin- 
ear one by introducing the appropriate multiplicity fac- 
tor. Using this approach, one implicitly assumes that all 
degenerate strains are evenly populated, i.e., that there 
exist high transition rates among these strains. This as- 
sumption has been exploited in the study of the phase 
diagram of the single sharp maximum case M . 

Finally, assuming a hierarchy for the relevance of mu- 
tations, one can have a pure linear genotypic space. For 
instance, let us consider Boolean sequences of length L 
and assume that 000 ... is the master sequence. Delete- 
rious mutations 0^1 are assumed to be non-lethal only 
if they accumulate at the ends of the sequence, as (for 
L = ?,) 



f f f ^ f fO ^ 100 ^ 000 <-> 001 ^ oil ^ 111, 



(1) 



where the arrows denote the mutations. One can intro- 
duce a genotypic index — 3 < x < 3 and rewrite eq. (0) 



as 



-3 ^ -2 ^ -1 ^ ^ 1 ^ 2 ^ 3 = -3, 

i.e., we have a linear genotypic space with periodic 
boundary conditions. 

An hypothetical example of such a hierarchical space is 
that of a series of genes that code for enzymes involved in 
a metabolic pathway. A mutation in the first enzyme of 
the sequence is more likely lethal, while a mutation that 
lowers the affinity of the last enzyme with its substrate 
could be easily retained even if the fitness of the individ- 
ual is lowered. This mutation reduce also the specificity 
of the last enzyme with its substrate (which is the prod- 
uct of the previous metabolic step) , allowing a mutation 
in the previous enzyme and so on. 

Almost all the works dealt with abstract landscapes 
(mainly RNA world). The difficulty in applying these 
concepts to real biological systems concerns the defini- 
tion of the fitness function, that relates the genotype to 
the phenotype. In particular, the difficulty resides in pre- 
dicting the stability or the efficiency of a protein given 
its sequence of amino acids. One can circumvent this 
difficulty taking into consideration only the subclass of 
all possible mutations that do not change the protein 
structure. One example of an explicit definition of the 



fitness function is given by the variation of the repro- 
ductive rate of bacteria due to synonymous mutations 
and tRNA usage [|[ || . This study can be considered an 
example of a degenerate smooth maximum fitness land- 
scape. Another explicit biological application concerns 
the evolution of RNA viruses on HeLa cultures ||l^ . In 
this case the fitness landscape was assumed to be linear 
(without multiplicity). 

While in general the fitness landscape depends on the 
presence of others individuals and changes with time, it 
is much simpler to study the problem for a given (static) 
landscape, that can be thought as an approximation 
for diluted, rapidly evolving organisms or self-catalytic 
molecules (RNA world), while all other species remain 
constant. In these static landscapes, all strains are cou- 
pled by the normalization of the probability distribution, 
and for small values of the mutation rate (a situation ful- 
filled in the real case) and smooth landscapes, the fittest 
quasi-species always eliminates all others g] . The global 
coupling given by the normalization of probability corre- 
sponds to the case of finite population size or the alterna- 
tive phases of exponential growth followed by starvation 
and death. For of a rugged static landscape (for instance 
generated by an Hopfield Hamiltonian [|[ ||), the distri- 
bution of species can reflect the distribution of the peaks 
of fitness. In these cases the interesting question con- 
cerns the error transition. 

In this work we address the problem of species forma- 
tion in presence of competition. The idea of our approach 
is the following: we look for a stable probability distri- 
bution formed by separated quasi-species, and for each 
of them we compute the effective fitness landscape due 
to the competition with individuals of the same and all 
other species. Then, the parameters of the distribution 
(position and weight of quasi-species) are obtained in a 
self-consistent way. In this way we are able to compute 
analytically the threshold for species formation transi- 
tion in a linear landscape. 

We shall deal with coupled differential and finite dif- 
ference equations, that can be thought as a mean field 
approximation of a true microscopic model. The effect 
of the finiteness of population, however, should imply 
a cutoff on the tail of the distribution, due to the dis- 
creteness of the individuals, and thus the dependence of 
evolution on the initial condition (for an application of 
the cutoff effect, see Ref. [|o|). We do not consider here 
these effects. 

The sketch of this paper is the following: first of all, 
we formalize the model in section y, then we work out 
the distribution of a quasi-species near a maximum of the 
effective fitness landscape in section 0, and finally we ap- 
ply the self-consistency condition in section |[ comparing 
the analytical approximation with the numerical resolu- 
tion. Conclusions and perspectives are drawn in the last 
section. 



2 The model 

We describe in detail the approximations that lead to 
our model. An individual is identified by its genome, 
represented by an integer index x (no polymorphism nor 
age structure). We study the case of a linear genotypic 
space (hierarchical relevance of mutations). 

We shall not consider here age structure nor the ef- 
fects of polymorphism in the phenotype. For the sake 
of simplicity, we shall deal only with haploid organisms. 
Moreover, we do not consider the spatial structure (spa- 
tial mean field). The experimental setup of reference is 
that of an bacterial population that grows in a stirred liq- 
uid medium, with constant supply of food and removal 
of solution, so that the average size of the population 
is constant. Another possible experiment concerns RNA 
viruses [|0[. 

We consider the distribution p{x,t), that gives the 
probability of observing the strain x at time t within 
the population. We shall denote the whole distribution 
as p{t). At each time step we have 



XI P(^'^) 



1. 



(2) 



Organisms undergo selection, reproduction and muta- 
tion. The reproduction and death rates are represented 
by a fitness function A(x,p(i)), that represents the av- 
erage fraction of individual of a given strain x surviving 
after a time step in absence of mutation for a given prob- 
ability distribution p{t) . 

As usual, we consider only point mutations, and we 
factorize the probability of multiple mutations (i.e., they 
are considered independent events). The rate of muta- 
tion per time step of a single element of the genome is 
yu; each point mutation connects the strain a; to a; -I- 1 or 
x-1. 

Since we want to model existing populations, we deal 
with small mutation rates. In this limit, only one point 
mutation can occur at most during a time step. This 
is the main difference with previous works, in which the 
main goal was to study a mutation-induced phase tran- 
sition (error threshold). 

With these assumptions, the generic evolution equa- 
tion (master equation) for the probability distribution 
is 

a{t)p{x,t + l)= (l + f^^JA{x,pit))p{x,t): (3) 
where the discrete second derivative 6"^ /Sx"^ is defined as 



bx^ 



= .f{x + l) + f{x-l)-2f{x), 



and a(t) maintains the normalization of p{t). In the 
following we shall mix freely the continuous and discrete 
formulations of the problem. 



The numerical resolution of eq. (0) shows that a sta- 
ble asymptotic distribution exists for almost all initial 
conditions. In the asymptotic limit t -^ oo, p{t -I- 1) = 
p{t) = p{x). Summing over x in eq. (y) and using the 
normalization condition, eq. (0), we have: 



a = ^A{x,p)p{x) = A 



(4) 



The normalization factor a thus corresponds to the av- 
erage fitness. The quantities A and a are defined up to 
an arbitrary constant. 

In general the fitness A depends on x and on the prob- 
ability distribution p. The dependence on x includes the 
structural stability of proteins, the efficiency of enzymes, 
etc. This corresponds to the fitness of the individual x 
if grown in isolation. On the other hand, the effective 
fitness seen by an individual depends also on the compo- 
sition of the environment, i.e., on p. This p-dependence 
can be further split into two parts: the competition with 
other clones of the same strain, (intra-strain competi- 
tion) and that with different strains (inter-strains compe- 
tition), disregarding more complex patterns as the group 
structure (colonies). The intra-strain term has the effect 
of broadening the curve of a quasi-species and of lower- 
ing its fitness, while the inter-strains part can induce the 
formation of distinct quasi-species. 

Since A is strictly positive, it can be written as 

A{x,p) =cxp{H{x,p)). 

If A is sufficiently smooth (including the dependence on 
p), one can rewrite eq. (^ in the asymptotic limit, using 
a continuous approximation for x as 



ap = Ap + fi—{Ap), 



(5) 



Where we have neglected to indicate the genotype in- 
dex X and the explicit dependence on p. Eq. (g) has the 
form of a nonlinear diffusion-reaction equation. Since we 
want to investigate the phenomenon of species forma- 
tion, we look for an asymptotic distribution p formed by 
a superposition of several non-overlapping bell-shaped 
curves, where the term non-overlapping means almost 
uncoupled by mutations. Let us number these curves us- 
ing the index i, and denote each of them as Pi{x), with 
p{x) = J2iPii^)- Each Pi{x) is centered around Xi and 
its weight is J piix)dx ~ 7i, with 'Yliili — 1- We fur- 
ther assume that each pi (x) obeys the same asymptotic 
condition, eq. (o) (this is a sufficient but not necessary 
condition). Defining 



Ai = — / A{x)pi{x)dx = a, 



(6) 



we see that in a stable ecosystem all quasi-species have 
the same average fitness. 



3 Evolution near a maximum 

We need the expression of p if a given static fitness A{x) 
has a smooth, isolated maximum for a; = {smooth max- 
imum approximation). Let us assume that 



A{x) ~ Ao{l -ax'^), 



(7) 



where ^o = ^(0)- Substituting q = Ap in eq. (||) we 
have (neglecting to indicate the genotype index x, and 
using primes to denote differentiation with respect to it): 



A 



q = q + fiq 



Looking for q = exp(it;), 



^-I + mK' + u;"), 



and approximating A ^ ^ A^^ (^ + o,^'^): we have 

Aq 
A possible solution is 



(8) 



w{x) 



x 

2^' 



Substituting into eq. (m we finally get 






(9) 



Since a — A, a/Ao is less than one we have chosen the 
minus sign. In the limit afj, —^ (small mutation rate 
and smooth maximum), we have 



a 

To 



a/j, 



and 



(10) 



The asymptotic solution is 
1 + ax'^ 



p{x) = 7 



27ra(l + aa2) ^''^ V 2a^ 



so that J p{x)dx = 7. The solution is a bell-shaped 
curve, its width a being determined by the combined 
effects of the curvature a of maximum and the mutation 
rate ^.. In the next section, we shall apply these results 
to a quasi-species i. In this case one should substitute 
P ^ Pi, 1 ^ li and X ^ X -Xi. 

For completeness, we study also the case of a sharp 
m,axim,um, for which A{x) varies considerably with x. In 
this case the growth rate of less fit strains has a large 
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Figure 1: Average fitness oi/Aq versus the coefficient a, of the 
fitness function, eq. (l7|), for some values of the mutation rate 
^. Legend: numerical resolution corresponds to the numerical 
solution of eq. yj^, smooth maximum refers to eq. (|9|) and 
sharp maximum to eq. (|ll[) 



contribution from the mutations of fittest strains, while 
the reverse flow is negligible, thus 

p{x - l)A{x - 1) > p{x)A{x) > p{x + l)A{x + 1) 



neglecting last term, and substituting q{x) — A(x)p{x) 
in eq. (||) we get: 



^0 



q{x) 



M 



{aA{x) - 1 + 2/x) 



= 1-2^ for a; = (11) 
q{x - 1) for a; > (12) 



Near x — Q, combining eq. dill), eq. (|l^)and eq. (0)), 
we have 



q{x) 



A* 



q{x-l). 



(1 - 2^)aa;2 
In this approximation the solution is 
/i \ 1 



q{x) 



1 - 2/ia/ (x!) 



.1^2^ 



and 



y{x) = A{x)q{x) ~ ^(1 + ax^) 



1 



aa I xl'^ 



We have checked the validity of these approximations 
numerically solving eq. (0); the comparisons are shown 



in Figure (El) . We observe that the smooth maximum ap- 
proximation agrees with the numerics for for small val- 
ues of a, when A{x) varies slowly with x, while the sharp 
maxim,um, approximation agrees with the numerical re- 
sults for large values of a, when small variations of x 
correspond to large variations of A{x). 

4 Speciation 

Let us now study the stable quasi-species distribution 
for a simple interacting fitness landscape. The fitness 
A{x,p) = exp{H{x,p)) is given by 

H{x,p) = Ha + Hi{x) + H2ix,p) + . . . 

where Hq is an arbitrary constant, Hi(x) is the static 
landscape, i.e., the fitness seen by an individual in iso- 
lation (it includes the interaction with all other slowly 
varying species) and H2{x,p) is the interaction land- 
scape. We examine the case of a single quadratic maxi- 
mum of Hi , using the explicit form: 



Hiix) 



where r gives the amplitude of the quadratic maximum, 
and b is the curvature. For x -^ oo, Hi{x) ~ b{l— \x\/r), 
while for x ^ 0, Hi{x) ~ —hx^jr^. We have checked 
numerically that other similar smooth potentials give the 
same results of this one. 

We assume that the interactions among individuals 
are always negative (competition) and decrease exponen- 
tially with the distance: 




H2{x,p) = - J / exp 



(x 



,/-)piy)dy- 



Numerically solving eq. (g|) we obtain the asymptotic 
probability distribution showed in Figure H. One can 
observe the presence of several non-overlapping quasi- 
species. For i? — > oo, substituting p{x) = X]i-Pi(^)i '^^^ 
has 



H2{x,p) = -jy^7texp 



{X- Xjf 
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Figure 2: Probability distribution with five quasi-species. 
Numerical values are ^l = 0.01, Hq = 1.0, b — 0.2, J — 7.0, 
i? = 10 and r- = 3. 



We consider now the case of three species, two of which 
are symmetric with respect to the dominant one. We 
have Xq = 0, Si = —X2 — x. In the limit /i ^ 0, we can 
consider a — Ao{l — ./afj) = Aq, and thus a = A{0) = 
A{x) (this is a strong approximation which simplifies the 
computation), and 



bx 



rx 7o rx 



Finally, we have the following system 

z2-G(7o-7i)^ + 1-2^-2^=0, 

70 R 

70 + 271 = 1, 
G7ozexp(-zV2) = l, 



(13) 

(14) 
(15) 



where z — x/R and G — Jr/Rb. 

The limit of coexistence for the three species is given 
by 7o = 1 (and thus 71 =0). We compute the critical 
value Gc of G for the coexistence of three species, in the 



limit r/R — > 0. The first order term G, 
computing z from eq. (04) 



(0) 



is obtained 



The location Xk of the maximum of the quasi-species 
k is given by: 



G. 



(0) 



g: 



(0)^ 



dA 
dx 



= 0. 



The species occupies the fittest position xo = 0. For 
k ^ we have (using the large x approximation for Hi): 



b , jsr^ Xk-Xi I (xk - Xi) 
-r+''2:^^^^'^^[ 2R^ 



= 0. 



and inserting this value into eq. (ttq). Solving numer- 



ically this equation, we have G, 



(0) 



2.2160. The first 
correction Gc'''{r/R) is obtained from eq. (f3), and is 



simply G 



(1) 



-r/R. So finally we have for the critical 



threshold of species formation Gc 



Gc = 2.216 



r 
R' 



(16) 




Figure 3: Behavior of Gc versus r/R. The continuous hne 
represents the analytical approximation, eq. (Ilq), the circles 
are obtained from numerical resolution. The error bars rep- 
resent the maximum error. 



We have solved numerically eq. (ra) for different values 
of the parameters, and we have checked that the thresh- 
old of coexistence of the three species depends only on 
G. In particular, this threshold does not depends on the 
mutation rate /i, at least for ^ < 0.1, which is a very 
high mutation rate for real organisms. The most impor- 
tant effect of /i is the broadening of quasi-species curves, 
that can eventually merge. In the range of parameters 
used, G depends only on ratio r/R. Both these results 
are in agreement with the analytical predictions obtained 
above. In Figure ^ we compare the numerical and ana- 
lytical results, plotting the different threshold value Gc 
as function of r/R. 

5 Discussion and conclusions 

We have studied a simple model for species formation. 
This model can be considered an extension of Eigen's 
one |l|, y, with the inclusion of competition, which if the 
fundamental ingredient for species formation in smooth 
landscapes. On the other hand, from an individual's 
point of view and disregarding complex structures such 
as the colonial organization, the more similar the phe- 
notype the more important the sharing of resources and 
thus the competition. Since we assumed a smooth de- 
pendence of phenotype on genotype, we simply modeled 
the competition J{x, y) between the two strains x and y 
by means of a smooth function of the distance between 
two genotypes: J{x,y) = — Jexp(— (x — j/)^/2_R^). In 
this way the strongest competition occurs with other in- 
stances of the same strain, which is reasonable. One can 
interpret our interaction terms as a cluster expansion of a 
long-range potential, in which we retained single and two 
bodies contributions. From the point of view of popula- 
tion dynamics, our form of modeling the competition is 



equivalent to the Verlhust damping term (logistic equa- 
tion). 

In a real ecosystem, however, there could be positive 
contributions to the interaction term J. In particular, it 
can happen that J{x, y) > and J{y, x) < 0) (predation 
or parassitism), or J{x,y) > and J{y,x) > 0) (coop- 
eration) . An investigation on the origin of complexity in 
random ecosystems is in progress. In particular we want 
to study the effects of time fluctuation of fitness (say 
due to human interaction) on the number of coexisting 
species. 

We have studied the effects of competition in a linear 
(i.e., hierarchic) genotypic space. Our results synthesize 
in Figure y. The dependence of the threshold for the 
formation of quasi-species obtained analytically from our 
approximations reflects very well the numerical results. 
We also checked that the latter does not depend on the 
mutation rate /z, up to /i = 0.1. 
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